Submitted by:
ID1: 320983216 Name1: Maxim Kolchinsky
ID2: 321340580 Name2: Anna Romanov

The Biology

The data for this lesson comes from:
> Saigi et al. “MET-Oncogenic and JAK2-Inactivating Alterations Are Independent Factors That Affect Regulation of PD-L1 Expression in Lung Cancer” PLoS ONE. 2018 Jun 13;9(6):e99625. PMID: 24926665.

Purpose: The blockade of immune checkpoints such as PDL1 and PD-1 is being exploited therapeutically in several types of malignancies. Here, we aimed to understand the contribution of the genetics of lung cancer to the ability of tumor cells to escape immunosurveillance checkpoints. Experimental Design: More than 150 primary non-small cell lung cancers, including pulmonary sarcomatoid carcinomas, were tested for levels of the HLA-I complex, PD-L1, tumor-infiltrating CD8þ lymphocytes, and alterations in main lung cancer genes. Correlations were validated in cancer cell lines using appropriate treatments to activate or inhibit selected pathways. We also performed RNA sequencing to assess changes in gene expression after these treatments. Results: MET-oncogenic activation tended to associate with positive PD-L1 immunostaining, whereas STK11 mutations were correlated with negative immunostaining. In MET-altered cancer cells, MET triggered a transcriptional increase of PD-L1 that was independent of the IFNgmediated JAK/STAT pathway. The activation of MET also upregulated other immunosuppressive genes (PDCD1LG2 and SOCS1) and transcripts involved in angiogenesis (VEGFA and NRP1) and in cell proliferation. We also report recurrent inactivating mutations in JAK2 that co-occur with alterations in MET and STK11, which prevented the induction of immunoresponse-related genes following treatment with IFNg. Conclusions: We show that MET activation promotes the expression of several negative checkpoint regulators of the immunoresponse, including PD-L1. In addition, we report inactivation of JAK2 in lung cancer cells that prevented the response to IFNg. These alterations are likely to facilitate tumor growth by enabling immune tolerance and may affect the response to immune checkpoint inhibitors

Data

This data was downloaded from GEO (GSE:GSE109720)

Import count data and metadata

library(readr)
library(dplyr)
library(ggplot2)

rawcounts <- read_csv("data/lung_counts.csv")
metadata <-  read_csv("data/lung_metadata.csv")

General information on the data

There are 4 cell types in the dataset

The distribution of samples accross cell types is:

table(metadata$celltype)
## 
##  EBC1 H1573 H1993  H596 
##     9     6     6    12

The number of expressed genes per sample:

vals<- data.frame('expressed_genes' = colSums(rawcounts[,-1]>0))
vals
ggplot(vals, aes(x=expressed_genes)) + geom_histogram(bins=15)

vals<- data.frame('highly_expressed_genes' = colSums(rawcounts[,-1]>1000))
vals
ggplot(vals, aes(x=highly_expressed_genes)) + geom_histogram(bins=15)

Filter the data

treatment1<- 'No treatment'
treatment2<- 'Crizotinib'
celltype0<- 'EBC1'
dex <- metadata$dex
filtered_metadata<- metadata %>% filter(dex==treatment1 | dex==treatment2 ) %>% filter(celltype==celltype0)
filtered_metadata
filtered_samples<- filtered_metadata$id
filtered_counts<- rawcounts %>% select(filtered_samples)
filtered_counts<- bind_cols(rawcounts[,'ensgene'],filtered_counts)
filtered_counts

DESeq2 analysis

library(DESeq2)

Importing data

first we fix the row names:

filtered_countsRN <- data.frame(filtered_counts, row.names=1)
filtered_countsRN
filtered_metadataRN <- data.frame(filtered_metadata, row.names=1)
filtered_metadataRN

now we can construct the DESeq2 dataset

dds <- DESeqDataSetFromMatrix(countData=filtered_countsRN, colData=filtered_metadataRN, design=~dex)
dds
## class: DESeqDataSet 
## dim: 58347 6 
## metadata(1): version
## assays(1): counts
## rownames(58347): ENSG00000223972.5 ENSG00000227232.5 ...
##   ENSG00000278625.1 ENSG00000277374.1
## rowData names(0):
## colnames(6): AE1160 AE1163 ... AE1165 AE1168
## colData names(3): celltype dex geo_id

Run the DESeq pipeline

We run the DESe12 pipline:

dds <- DESeq(dds)

Getting results

first look at the resuts

res <- results(dds, tidy=TRUE)
res <- tbl_df(res)
res

Using a %>%, arrange the results by the adjusted p-value.

res <- res %>%
  arrange(padj)
res
res_1 <- res %>% filter(padj<0.05) %>% nrow()

There 5264 are genes with p-value < 0.05

res_2 <- res %>% filter(padj<0.01) %>% nrow()

There 4001 are genes with p-value < 0.01

res %>%
  filter(padj<0.95) %>%
  write_csv("sigresults.csv")

Data Visualization

Plotting counts

gene <- "ENSG00000146072.6"
gene_name <- "ENSG00000146072.6"
plotCounts(dds, gene=gene, intgroup="dex",returnData=TRUE) %>% ggplot(aes(dex, count)) + geom_boxplot(aes(fill=dex)) + scale_y_log10() + ggtitle(gene_name)

MA & Volcano plots

# Create the new column
res <- res %>% mutate(sig=padj<0.95)

# How many of each?
res %>%
  group_by(sig) %>%
  summarize(n=n())
log2fold_above_2 = res %>% filter(log2FoldChange > 2) %>% nrow()
log2fold_below_minus_2 = res %>% filter(log2FoldChange < -2) %>% nrow()

res %>% ggplot(aes(baseMean, log2FoldChange, col=sig)) + geom_point() + scale_x_log10() + ggtitle("MA plot")

There are 573 with log fold change above 2

There are 654 with log fold change below -2

res %>% ggplot(aes(log2FoldChange, -1*log10(pvalue), col=sig)) + geom_point() + ggtitle("Volcano plot")

Record sessionInfo()

The sessionInfo() prints version information about R and any attached packages. It’s a good practice to always run this command at the end of your R session and record it for the sake of reproducibility in the future.

sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17134)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Israel.1252  LC_CTYPE=English_Israel.1252   
## [3] LC_MONETARY=English_Israel.1252 LC_NUMERIC=C                   
## [5] LC_TIME=English_Israel.1252    
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] DESeq2_1.22.1               SummarizedExperiment_1.12.0
##  [3] DelayedArray_0.8.0          BiocParallel_1.16.2        
##  [5] matrixStats_0.54.0          Biobase_2.42.0             
##  [7] GenomicRanges_1.34.0        GenomeInfoDb_1.18.1        
##  [9] IRanges_2.16.0              S4Vectors_0.20.1           
## [11] BiocGenerics_0.28.0         bindrcpp_0.2.2             
## [13] dplyr_0.7.8                 readr_1.2.1                
## [15] ggplot2_3.1.0               knitr_1.20                 
## 
## loaded via a namespace (and not attached):
##  [1] bit64_0.9-7            jsonlite_1.5           splines_3.5.1         
##  [4] Formula_1.2-3          assertthat_0.2.0       latticeExtra_0.6-28   
##  [7] blob_1.1.1             GenomeInfoDbData_1.2.0 yaml_2.2.0            
## [10] RSQLite_2.1.1          pillar_1.3.0           backports_1.1.2       
## [13] lattice_0.20-35        glue_1.3.0             digest_0.6.18         
## [16] RColorBrewer_1.1-2     XVector_0.22.0         checkmate_1.8.5       
## [19] colorspace_1.3-2       htmltools_0.3.6        Matrix_1.2-14         
## [22] plyr_1.8.4             XML_3.98-1.16          pkgconfig_2.0.2       
## [25] genefilter_1.64.0      zlibbioc_1.28.0        xtable_1.8-3          
## [28] purrr_0.2.5            scales_1.0.0           annotate_1.60.0       
## [31] tibble_1.4.2           htmlTable_1.12         withr_2.1.2           
## [34] nnet_7.3-12            lazyeval_0.2.1         survival_2.42-3       
## [37] magrittr_1.5           crayon_1.3.4           memoise_1.1.0         
## [40] evaluate_0.12          foreign_0.8-70         tools_3.5.1           
## [43] data.table_1.11.8      hms_0.4.2              stringr_1.3.1         
## [46] locfit_1.5-9.1         munsell_0.5.0          cluster_2.0.7-1       
## [49] AnnotationDbi_1.44.0   compiler_3.5.1         rlang_0.3.0.1         
## [52] grid_3.5.1             RCurl_1.95-4.11        rstudioapi_0.8        
## [55] htmlwidgets_1.3        bitops_1.0-6           base64enc_0.1-3       
## [58] labeling_0.3           rmarkdown_1.11         gtable_0.2.0          
## [61] codetools_0.2-15       DBI_1.0.0              R6_2.3.0              
## [64] gridExtra_2.3          bit_1.1-14             bindr_0.1.1           
## [67] Hmisc_4.1-1            stringi_1.2.4          Rcpp_1.0.0            
## [70] geneplotter_1.60.0     rpart_4.1-13           acepack_1.4.1         
## [73] tidyselect_0.2.5